Improving Chinese Storing Text Retrieval Systems' Security via a Novel Maximal Prefix Coding
Abstract
Huffman coding has been widely used in data, image, and video compression. In this paper, a novel maximal prefix coding is introduced, and the relationship between Huffman coding and optimal maximal prefix coding is discussed. We show that all Huffman coding schemes are optimal maximal prefix coding schemes and that, conversely, optimal maximal prefix coding schemes need not be Huffman coding schemes. Moreover, it is proven that, for any maximal prefix code C, there exists an information source I = (Σ, P) such that C is exactly a Huffman code for I. It is therefore essentially shown that the class of Huffman codes coincides with the class of maximal prefix codes. A case study of data compression is also given. Compared with Huffman coding, maximal prefix coding can be used not only for statistical modeling but also for dictionary methods. It is also well suited to large information retrieval systems and to improving their security.
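The claim that every maximal prefix code is a Huffman code for some source can be illustrated with a small sketch. It is not the paper's proof. It assumes a finite binary code, for which a prefix code is maximal exactly when its Kraft sum equals 1, and it picks the dyadic source P(s) = 2^(-len(codeword of s)) as one convenient choice. The function names and the example code {0, 10, 110, 111} are hypothetical.

```python
from fractions import Fraction
from itertools import count
import heapq


def is_prefix_code(words):
    """True if no codeword is a prefix of another codeword."""
    ordered = sorted(words)
    # In lexicographic order, any word that extends w sits directly after w.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def kraft_sum(words):
    """Exact Kraft sum of a binary code: sum of 2^(-length)."""
    return sum(Fraction(1, 2 ** len(w)) for w in words)


def is_maximal_prefix_code(words):
    # Assumption: finite binary code, so maximal <=> prefix and Kraft sum == 1.
    return is_prefix_code(words) and kraft_sum(words) == 1


def huffman_lengths(probs):
    """Codeword lengths produced by the Huffman algorithm for symbol -> probability."""
    tie = count()  # tie-breaker so the heap never compares symbol lists
    heap = [(p, next(tie), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1  # every symbol under the merged node gets one bit deeper
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return lengths


# Hypothetical maximal prefix code C over the symbols a, b, c, d.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

# Assumed dyadic source: each symbol's probability is 2^(-codeword length).
source = {s: Fraction(1, 2 ** len(w)) for s, w in code.items()}

huffman = huffman_lengths(source)
code_lengths = {s: len(w) for s, w in code.items()}
print(is_maximal_prefix_code(code.values()), huffman == code_lengths)
```

For this source the Huffman algorithm returns the same codeword lengths as C, which is consistent with the statement in the abstract. The sketch compares lengths only; it does not check the individual bit patterns.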
Related articles
Improving Information Retrieval System Security via an Optimal Maximal Coding Scheme
Novel maximal coding compression techniques for the most important file of any full-text retrieval system, the text file, are discussed in this paper. As a continuation of our previous work, we show that the optimal maximal coding schemes coincide with the optimal uniquely decodable coding schemes. An efficient algorithm generating an optimal maximal code (or an optimal uniquely decodable code) i...
Improving Semistatic Compression Via Pair-Based Coding
In recent years, new semistatic word-based byte-oriented compressors, such as Plain and Tagged Huffman and the Dense Codes, have been used to improve the efficiency of text retrieval systems, while reducing the compressed collections to 30–35% of their original size. In this paper, we present a new semistatic compressor, called Pair-Based End-Tagged Dense Code (PETDC). PETDC compresses Englis...
Optimal Maximal Prefix Coding and Huffman Coding
Huffman coding has been widely used in data, image, and video compression. Novel maximal prefix coding different from the Huffman coding is introduced. Relationships between the Huffman coding and optimal maximal prefix coding are discussed. We show that all Huffman coding schemes are maximal prefix coding schemes and have the shortest average code word length among maximal prefix coding scheme...
Application of the Tightness Continuum Measure to Chinese Information Retrieval
Most word segmentation methods employed in Chinese Information Retrieval systems are based on a static dictionary or a model trained against a manually segmented corpus. These general segmentation approaches may not be optimal because they disregard information within semantic units. We propose a novel method for improving word-based Chinese IR, which performs segmentation according to the tigh...
Private Key based query on encrypted data
Nowadays, users of information systems tend to use a central server to reduce data transfer and maintenance costs. Since such a system is not fully trustworthy, users' data is usually kept encrypted. However, encryption is not a cure-all for security problems and cannot guarantee data security. In other words, there are some techniques that can endanger the security of encrypted d...
Journal:
- Int. J. Comput. Proc. Oriental Lang.
Volume 15
Publication date: 2002